Deep Learning - Common Practices Part 2
(Deep Learning - Plain Version 2020)

This video discusses the use of validation data, how to choose architectures and hyperparameters, and ensembling.

Welcome everybody to deep learning. So today we want to look into further common practices

and in particular in this video we want to discuss architecture selection and hyperparameter

optimization.

And you know, nothing in machine learning is exact, right?

The test data is still in the vault; we are not touching it. However, we need to set our hyperparameters somehow, and as you've already seen, there is an enormous number of hyperparameters.

You have to select an architecture, the number of layers, the number of nodes per layer, the activation functions, and then you have all the parameters of the optimization, the initialization, the loss function, and many more.

The optimizers also have options like the type of gradient descent, momentum, learning rate decay, and batch size. In regularization you have different regularizers, L1 and L2 losses, batch normalization, dropout, and so on.

You want to somehow figure out all the parameters for those different kinds of procedures.
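To make this concrete, here is a minimal sketch, in plain Python, of how such a set of hyperparameters might be collected in one configuration. All names and values are illustrative placeholders, not choices recommended in the lecture.

# A minimal sketch of the kinds of hyperparameters mentioned above,
# collected in one place. All names and values are illustrative only.
config = {
    # architecture
    "num_layers": 4,
    "nodes_per_layer": 256,
    "activation": "relu",
    # optimization
    "optimizer": "sgd",        # type of gradient descent
    "momentum": 0.9,
    "learning_rate": 1e-2,
    "lr_decay": 0.95,          # multiplicative decay per epoch
    "batch_size": 64,
    "init": "he_normal",
    "loss": "cross_entropy",
    # regularization
    "weight_decay": 1e-4,      # L2 penalty
    "dropout": 0.5,
    "batch_norm": True,
}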

Now let's choose an architecture and a loss function.

The first step would be to think about the problem and the data. What could the features look like?

What kind of spatial correlation do you expect?

What data augmentation makes sense?

How will the classes be distributed?

What is important regarding the target application?
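As a small illustration of the class distribution question, here is a hedged sketch in Python; the label list is a made-up stand-in for your actual training annotations.

from collections import Counter

# Hypothetical labels standing in for the annotations of a real dataset.
labels = ["cat", "dog", "cat", "cat", "bird", "dog", "cat"]

counts = Counter(labels)
total = sum(counts.values())
for cls, n in counts.most_common():
    print(f"{cls}: {n} samples ({100 * n / total:.1f}%)")

# A strongly skewed distribution would suggest class weighting, resampling,
# or a loss function that accounts for the imbalance.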

Then you start with simple architectures and loss functions and of course you do your research.

Try well-known models first and foremost.

They have been published, and there are so many papers out there, so there is no need to do everything yourself.

One day in the library can save you hours, weeks, or months of experimentation.

Do the research; it will really save you time.

Often, in very good papers, the authors don't just publish the scientific result; they also share the code and sometimes even the data.

Try to find those papers.

This can help you a lot with your experimentation.

So then you may want to change and adapt the architecture to your problem.

If you change something, find good reasons why this is an appropriate change.

There are quite a few papers out there that seem to introduce random changes into the architecture.

Later, it turns out that the observations they made were essentially random, and the authors were just lucky or experimented enough on their own data to get the improvements.

So there should typically also be a reasonable argument for why the specific change gives an improvement in performance.

Next you want to do your hyperparameter search.

So remember learning rate decay, regularization, dropout, and so on: these have to be tuned.

Still, the networks can take days or weeks to train, and you have to search for these hyperparameters.

Hence, we recommend using a log scale. So, for example, for the learning rate η you go for 0.1, 0.01, and 0.001.
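As a quick sketch (assuming NumPy is available), generating such log-scale candidates could look like this:

import numpy as np

# Candidate learning rates spaced on a log scale: steps of a factor of 10
# rather than equally spaced linear steps.
etas = np.logspace(-3, -1, num=3)   # array([0.001, 0.01, 0.1])
print(etas)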

You may want to consider a grid search or a random search.

In a grid search you would have equally spaced steps, and if you look at reference 2, they have shown that a random search has advantages over the grid search.

First of all, it's easier to implement, and second, it has a better exploration of the search space.
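To illustrate the difference, here is a small, self-contained Python sketch; train_and_validate is a hypothetical stand-in for your real training and validation run, and all value ranges are just examples:

import itertools
import numpy as np

rng = np.random.default_rng(0)

def train_and_validate(lr, dropout):
    # Hypothetical stand-in for a real training + validation run;
    # returns a dummy validation score so the sketch is runnable.
    return -abs(np.log10(lr) + 2.0) - abs(dropout - 0.4)

# Grid search: every combination of a few fixed values (here 3 x 2 = 6 runs),
# so each hyperparameter is only probed at 3 or 2 distinct values.
lr_grid = [1e-3, 1e-2, 1e-1]
dropout_grid = [0.3, 0.5]
grid_trials = list(itertools.product(lr_grid, dropout_grid))

# Random search with the same budget of 6 runs: each hyperparameter is
# sampled independently (the learning rate on a log scale), so the 6 runs
# probe 6 distinct values per hyperparameter.
random_trials = [(10.0 ** rng.uniform(-3, -1), rng.uniform(0.3, 0.5))
                 for _ in range(6)]

best = max(random_trials, key=lambda t: train_and_validate(*t))
print("best random-search trial (lr, dropout):", best)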
